Disfluency and Out-Of-Vocabulary Word Processing in Arabic Speech Understanding

Authors

  • Younès Bahou
  • Lamia Hadrich Belguith
  • Abdelmajid Ben Hamadou
Abstract

The disfluencies inherent in spontaneous speech and the out-of-vocabulary words omnipresent in any oral utterance transcribed by speech recognition are a real challenge for speech understanding systems. In this paper, we therefore propose a method for processing disfluencies and out-of-vocabulary words in the context of automatic Arabic speech understanding. Our method, based on a robust and partial analysis of Arabic oral utterances (conceptual segment analysis), is effective for the treatment of such phenomena. The method has been tested through the understanding module of the SARF system, an interactive vocal server for Tunisian railway information.

Similar resources

Arabic Language Modeling with Stem-derived Morphemes for Automatic Speech Recognition

The goal of this dissertation is to introduce a method for deriving morphemes from Arabic words using stem patterns, a feature of Arabic morphology. The motivations are three-fold: modeling with morphemes rather than words should help address the out-of-vocabulary problem; working with stem patterns should prove to be a cross-dialectally valid method for deriving morphemes using a small amount o...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

The Impact of Teachers' Training on the Reliability of Tests and Assessments in Governmental and Non-governmental Sections

Assessment is considered one of the fundamental elements in the field of foreign language acquisition. For communication to take place, learners need to know an adequate amount of vocabulary. The salient role of vocabulary in the field of foreign language acquisition has resulted in the publication of several hundred papers and dozens of books. Due to the dominant role of ...

Tight Integration of Speech Disfluency Removal into SMT

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluen...

Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages

We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity are obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) r...

Publication date: 2009